19 research outputs found

Policy Gradients for Probabilistic Constrained Reinforcement Learning

Author: Chen Weiqin
Paternain Santiago
Subramanian Dharmashankar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/04/2023

This paper considers the problem of learning safe policies in the context of reinforcement learning (RL). In particular, we consider the notion of probabilistic safety. That is, we aim to design policies that maintain the state of the system in a safe set with high probability. This notion differs from the cumulative constraints often considered in the literature. The challenge of working with probabilistic safety is the lack of expressions for its gradients. Indeed, policy optimization algorithms rely on gradients of the objective function and the constraints. To the best of our knowledge, this work is the first to provide such explicit gradient expressions for probabilistic constraints. It is worth noting that the gradient of this family of constraints can be applied to various policy-based algorithms. We demonstrate empirically that it is possible to handle probabilistic constraints in a continuous navigation problem.

arXiv.org e-Print Archive

A Multi-Channel Neural Graphical Event Model with Negative Evidence

Author: Bhattacharjya Debarun
Gao Tian
Mattei Nicholas
Shanmugam Karthikeyan
Subramanian Dharmashankar
Publication date: 21/02/2020

Event datasets are sequences of events of various types occurring irregularly over the timeline, and they are increasingly prevalent in numerous domains. Existing work on modeling events using conditional intensities relies either on some underlying parametric form to capture historical dependencies or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events with the introduction of fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied. Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A computational architecture to address combinatorial and stochastic aspects of process management problems

Author: Subramanian Dharmashankar
Publication venue: Purdue University (bepress)
Publication date: 01/01/2001

This thesis considers the problem of portfolio selection and task scheduling arising in research and development (R&D) pipeline management, where several projects compete for a limited pool of various resource types. Each project (product) usually involves a precedence-constrained network of testing tasks prior to product commercialization. If the project fails any of these tasks, then all the remaining work on that product is halted and the investment in the previous testing tasks is wasted. Further, there is significant uncertainty in the task duration, task resource requirement, task costs/rewards and task success probabilities. A two-loop computational architecture, Sim-Opt, which combines discrete event simulation and mathematical programming, has been developed by viewing the underlying stochastic optimization problem as the control problem of a performance-oriented, resource-constrained, stochastic discrete event dynamic system. Sim-Opt introduces the concept of a time line, which is a controlled, simulated trajectory that represents a specific combination of the realization of the various sources of uncertainty in the system. Multiple time lines are explored in the inner loop of Sim-Opt to accumulate information, which is subsequently used in the outer loop to obtain improving solutions to the system. Methods have been developed to integrate information from the inner loop with respect to portfolio selection and resource management. Industrially motivated case studies have been investigated using Sim-Opt to evaluate the effectiveness of different policies of operation, to evaluate the value of outsourcing of resources, and to obtain improving solutions in the outer loop. 
Basic algorithm and software engineering methods are described that achieve significant improvements in the performance of formulation generation and of heuristic lower-bound generation, along with the identification of cut families for the effective application of branch-and-cut solution methods. Lastly, the data complexity of the pipeline problem has been addressed by defining an XML-based structured input language for modeling the data needs in a formatted and extensible manner. This thesis demonstrates the benefit of explicitly viewing the R&D pipeline as the control problem of a discrete-event dynamic system and the effectiveness of Sim-Opt as a practical approach for addressing stochastic optimization.

Purdue E-Pubs

GaSPing for Utility

Author: Bhattacharjya Debarun
Gu Mengyang
Subramanian Dharmashankar
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 03/04/2020

High-consequence decisions often require a detailed investigation of a decision maker's preferences, as represented by a utility function. Inferring a decision maker's utility function through assessments typically involves an elicitation phase, where the decision maker responds to a series of elicitation queries, followed by an estimation phase, where the state of the art for direct elicitation approaches in practice is either to fit responses to a parametric form or to perform linear interpolation. We introduce a Bayesian nonparametric method involving Gaussian stochastic processes for estimating a utility function from direct elicitation responses. Advantages include the flexibility to fit a large class of functions, favorable theoretical properties, and a fully probabilistic view of the decision maker's preference properties, including risk attitude. Through extensive simulation experiments as well as two real datasets from management science, we demonstrate that the proposed approach results in better function fitting.

Association for the Advancement of Artificial Intelligence: AAAI Publications

Ordinal Historical Dependence in Graphical Event Models with Tree Representations

Author: Bhattacharjya Debarun
Gao Tian
Subramanian Dharmashankar
Publication venue: Association for the Advancement of Artificial Intelligence
Publication date: 18/05/2021

Graphical event models are representations that capture process independence between different types of events in multivariate temporal point processes. The literature consists of various parametric models and approaches to learn them from multivariate event stream data. Since these models are interpretable, they are often able to provide beneficial insights about event dynamics. In this paper, we show how to compactly model the situation where the order of occurrences of an event’s causes in some recent historical time interval impacts its occurrence rate; this sort of historical dependence is common in several real-world applications. To overcome the practical challenge of parameter explosion, which arises because the number of potential orders is super-exponential in the number of parents, we introduce a novel graphical event model based on a parametric tree representation for capturing ordinal historical dependence. We present an approach to learn such a model from data, demonstrating that the proposed model fits several real-world datasets better than relevant baselines. We also showcase the potential advantages of such a model to an analyst during the process of knowledge discovery.

Association for the Advancement of Artificial Intelligence: AAAI Publications